West Point, SAAVB, and BBN/AUB Arabic Speech Corpora: A Comparative Survey
Abstract
This paper evaluates three public Arabic speech corpora, namely the West Point (WP) corpus, the Saudi Accented Arabic Voice Bank (SAAVB), and the BBN Technologies/American University of Beirut (BBN/AUB) corpus, using the TIMIT English speech corpus as a benchmark. It covers the strengths, weaknesses, and discrepancies of these Arabic corpora with respect to their design and content. Such an evaluation matters for Arabic speech processing because Arabic remains an under-resourced language despite its importance and popularity. We are currently considering the WP and BBN/AUB corpora for analysing and studying Arabic rhythm in our ongoing research project.
Keywords: Arabic language; TIMIT; West Point; SAAVB; BBN/AUB.
Similar resources
Using a Telephony Saudi Accented Arabic Corpus in Automatic Recognition of Spoken Arabic Digits
In this research, spoken Arabic digits are investigated from the speech recognition problem point of view. The system is designed to recognize an isolated whole-word speech. In the training and testing phase of this system, isolated digits data sets are taken from the telephony Arabic speech corpus, SAAVB. This standard corpus was developed by KACST and it is classified as a noisy speech databa...
Saudi accented Arabic voice bank
The aim of this paper is to present an Arabic speech database that represents Arabic native speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers and transcribing their speech are some of the challenges that faced the project team. The procedures that met these challenges are h...
Speech Recognition System of Arabic Digits based on A Telephony Arabic Corpus
Automatic recognition of spoken digits is one of the difficult tasks in the field of computer speech recognition. Spoken digits recognition process is required in many applications such as speech based telephone dialing, airline reservation, automatic directory to retrieve or send information, etc. These applications take numbers and alphabets as input. Arabic language is a Semitic language tha...
Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus
Automatic recognition of spoken alphabets is one of the difficult tasks in the field of computer speech recognition. In this research, spoken Arabic alphabets are investigated from the speech recognition problem point of view. The system is designed to recognize an isolated whole-word speech. The Hidden Markov Model Toolkit (HTK) is used to implement the isolated word recognizer with phoneme ba...
The BBN Byblos 1997 large vocabulary conversational speech recognition system
This paper presents the 1997 BBN Byblos Large Vocabulary Speech Recognition (LVCSR) system. We give an outline of the algorithms and procedures used to train the system, describe the recognizer configuration and present the major technological innovations that lead to performance improvements. The major testbed we present our results for is the Switchboard Corpus, where current word error rates...